{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e239cc79",
   "metadata": {},
   "source": [
    "# How to use a time-weighted vector store retriever\n",
    "\n",
    "This retriever uses a combination of semantic similarity and a time decay.\n",
    "\n",
    "The algorithm for scoring them is:\n",
    "\n",
    "```\n",
    "semantic_similarity + (1.0 - decay_rate) ^ hours_passed\n",
    "```\n",
    "\n",
    "Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain \"fresh\".\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "97e74400",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime, timedelta\n",
    "\n",
    "import faiss\n",
    "from langchain.retrievers import TimeWeightedVectorStoreRetriever\n",
    "from langchain_community.docstore import InMemoryDocstore\n",
    "from langchain_community.vectorstores import FAISS\n",
    "from langchain_core.documents import Document\n",
    "from langchain_openai import OpenAIEmbeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89635236",
   "metadata": {},
   "source": [
    "## Low decay rate\n",
    "\n",
    "A low `decay rate` (in this, to be extreme, we will set it close to 0) means memories will be \"remembered\" for longer. A `decay rate` of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d3a1778d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define your embedding model\n",
    "embeddings_model = OpenAIEmbeddings()\n",
    "# Initialize the vectorstore as empty\n",
    "embedding_size = 1536\n",
    "index = faiss.IndexFlatL2(embedding_size)\n",
    "vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})\n",
    "retriever = TimeWeightedVectorStoreRetriever(\n",
    "    vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "408fc114",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['c3dcf671-3c0a-4273-9334-c4a913076bfa']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yesterday = datetime.now() - timedelta(days=1)\n",
    "retriever.add_documents(\n",
    "    [Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})]\n",
    ")\n",
    "retriever.add_documents([Document(page_content=\"hello foo\")])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8a5ed9ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 18, 457125), 'created_at': datetime.datetime(2023, 12, 27, 15, 30, 8, 442662), 'buffer_idx': 0})]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# \"Hello World\" is returned first because it is most salient, and the decay rate is close to 0., meaning it's still recent enough\n",
    "retriever.get_relevant_documents(\"hello world\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8bc4f96",
   "metadata": {},
   "source": [
    "## High decay rate\n",
    "\n",
    "With a high `decay rate` (e.g., several 9's), the `recency score` quickly goes to 0! If you set this all the way to 1, `recency` is 0 for all objects, once again making this equivalent to a vector lookup.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e588d729",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define your embedding model\n",
    "embeddings_model = OpenAIEmbeddings()\n",
    "# Initialize the vectorstore as empty\n",
    "embedding_size = 1536\n",
    "index = faiss.IndexFlatL2(embedding_size)\n",
    "vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})\n",
    "retriever = TimeWeightedVectorStoreRetriever(\n",
    "    vectorstore=vectorstore, decay_rate=0.999, k=1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "43b4afb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['eb1c4c86-01a8-40e3-8393-9a927295a950']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yesterday = datetime.now() - timedelta(days=1)\n",
    "retriever.add_documents(\n",
    "    [Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})]\n",
    ")\n",
    "retriever.add_documents([Document(page_content=\"hello foo\")])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0677113c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 50, 57185), 'created_at': datetime.datetime(2023, 12, 27, 15, 30, 44, 720490), 'buffer_idx': 1})]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# \"Hello Foo\" is returned first because \"hello world\" is mostly forgotten\n",
    "retriever.get_relevant_documents(\"hello world\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8b0075a",
   "metadata": {},
   "source": [
    "## Virtual time\n",
    "\n",
    "Using some utils in LangChain, you can mock out the time component.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0b4188e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "\n",
    "from langchain_core.utils import mock_now"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "95d55764",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2024, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 12, 27, 15, 30, 44, 532941), 'buffer_idx': 0})]\n"
     ]
    }
   ],
   "source": [
    "# Notice the last access time is that date time\n",
    "with mock_now(datetime.datetime(2024, 2, 3, 10, 11)):\n",
    "    print(retriever.get_relevant_documents(\"hello world\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a6da4c6",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
